library(glmmTMB) #for model fitting
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(DHARMa) #for residual diagnostics
library(see) #for plotting residuals
library(knitr) #for kable
library(effects) #for partial effects plots
library(ggeffects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(modelr) #for auxiliary modelling functions
library(tidyverse) #for data wrangling
library(lindia) #for diagnostics of lm and glm
library(performance) #for residuals diagnostics
library(sjPlot) #for outputs
library(report) #for reporting methods/results
library(easystats) #framework for stats, modelling and visualisation
library(MASS) #for negative binomials
library(MuMIn) #for AICc
library(patchwork) #for figure layouts
GLM Part 5
1 Preparations
Load the necessary libraries
2 Scenario
Here is a modified example from Quinn and Keough (2002). Day and Quinn (1989) described an experiment that examined how rock surface type affected the recruitment of barnacles to a rocky shore. The experiment had a single factor, surface type, with 4 treatments or levels: algal species 1 (ALG1), algal species 2 (ALG2), naturally bare surfaces (NB) and artificially scraped bare surfaces (S). There were 5 replicate plots for each surface type and the response (dependent) variable was the number of newly recruited barnacles on each plot after 4 weeks.
| TREAT | BARNACLE |
|---|---|
| ALG1 | 27 |
| .. | .. |
| ALG2 | 24 |
| .. | .. |
| NB | 9 |
| .. | .. |
| S | 12 |
| .. | .. |
| TREAT | Categorical listing of surface types. ALG1 = algal species 1, ALG2 = algal species 2, NB = naturally bare surface, S = scraped bare surface. |
| BARNACLE | The number of newly recruited barnacles on each plot after 4 weeks. |
3 Read in the data
As we are going to treat Treatment as a categorical predictor, we will specifically declare it as such straight after importing the data.
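The import code itself is not shown here, so the following is a minimal sketch of the step described above. The file name `day.csv` is an assumption, and the rows written to the temporary file are purely illustrative (they are not the full 20-row dataset):

```r
library(readr)
library(dplyr)

## stand-in for the real day.csv (illustrative rows only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("TREAT,BARNACLE",
             "ALG1,27",
             "ALG2,24",
             "NB,9",
             "S,12"), tmp)

day <- read_csv(tmp, show_col_types = FALSE) |>
  mutate(TREAT = factor(TREAT))  # declare TREAT categorical straight after import

str(day)
```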
Rows: 20 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): TREAT
dbl (1): BARNACLE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
day (20 rows and 2 variables, 2 shown)
ID | Name | Type | Missings | Values | N
---+----------+-------------+----------+---------+----------
1 | TREAT | categorical | 0 (0.0%) | ALG1 | 5 (25.0%)
| | | | ALG2 | 5 (25.0%)
| | | | NB | 5 (25.0%)
| | | | S | 5 (25.0%)
---+----------+-------------+----------+---------+----------
2 | BARNACLE | numeric | 0 (0.0%) | [8, 33] | 20
------------------------------------------------------------
4 Exploratory data analysis
Model formula: \[ y_i \sim{} \mathcal{Pois}(\mu_i)\\ log(\mu_i) = \boldsymbol{\beta} \bf{X_i} \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and treatment contrasts for the effects of Treatment on barnacle recruitment.
5 Fit the model
Categorical predictors are actually treated as multiple regression. We start by dummy coding the levels into separate vectors containing only 0’s and 1’s. The following fabricated example should illustrate this process. The first column represents a response (the numbers are not important for the dummy coding and are thus excluded from the example). Column X represents a categorical variable with three levels (A, B and C). This could be dummy coded into three dummies (one for each level of the categorical predictor).
| Y | X | Dummy 1 | Dummy 2 | Dummy 3 |
|---|---|---|---|---|
| . | A | 1 | 0 | 0 |
| . | A | 1 | 0 | 0 |
| . | B | 0 | 1 | 0 |
| . | B | 0 | 1 | 0 |
| . | C | 0 | 0 | 1 |
| . | C | 0 | 0 | 1 |
So, we could construct the linear predictor of a regression model as:
\[ log(\mu_i) = \beta_0 + \beta_1 D1 + \beta_2 D2 + \beta_3 D3 \]
however, this would be an overparameterized model as there are four effects parameters to estimate from only three chunks of data (one per level of the categorical predictor).
One option would be to drop the intercept:
\[ log(\mu_i) = \beta_1 D1 + \beta_2 D2 + \beta_3 D3 \]
Now each \(\beta\) just represents the mean of each level of the categorical predictor. Although this is perfectly valid, it is not a common way to express the model as it does not include any effects (differences).
Another option is to re-parameterise this model such that the intercept represents the mean of the first level of the categorical predictor and then two additional \(\beta\) parameters represent the differences between the first level and each of the remaining levels. This parameterisation is called treatment contrasts.
When we fit treatment contrasts, the first dummy variable is populated with 1’s.
| Y | X | Dummy 1 | Dummy 2 | Dummy 3 |
|---|---|---|---|---|
| . | A | 1 | 0 | 0 |
| . | A | 1 | 0 | 0 |
| . | B | 1 | 1 | 0 |
| . | B | 1 | 1 | 0 |
| . | C | 1 | 0 | 1 |
| . | C | 1 | 0 | 1 |
\[ log(\mu_i) = \beta_0 + \beta_2 D2 + \beta_3 D3 \]
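Both parameterisations can be inspected directly in R via `model.matrix()`. A self-contained sketch using a toy three-level factor (not the barnacle data):

```r
## toy categorical predictor with three levels, two replicates each
X <- factor(c("A", "A", "B", "B", "C", "C"))

## means parameterisation: one dummy per level, no intercept
model.matrix(~ X - 1)

## treatment contrasts (R's default): intercept (level A) plus
## level-vs-first-level contrast columns
model.matrix(~ X)
```

Note how the first column of the treatment-contrast matrix is all 1’s, exactly as in the second dummy-coding table above.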
Before actually fitting the model to our data, it might be instructive to estimate the effects via matrix multiplication so as to see the treatment contrasts in action. Note, this approach is essentially Ordinary Least Squares regression and, as such, assumes normality and homogeneity of variance. As the response observations are counts, a Gaussian distribution might not be the most appropriate choice. Nevertheless, it will suffice for the purpose of illustration.
#Effects model
## start by dummy coding the categorical predictor and expressing
## the treatment effects as a model matrix.
Xmat <- model.matrix(~TREAT, data=day)
Xmat |> head()
  (Intercept) TREATALG2 TREATNB TREATS
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 1 0 0 0
6 1 1 0 0
# solve(X'X)X'Y
solve(t(Xmat) %*% Xmat) %*% t(Xmat) %*% day$BARNACLE
            [,1]
(Intercept) 22.4
TREATALG2 6.0
TREATNB -7.4
TREATS -9.2
Now let's fit the model. As the response observations are counts, a Poisson distribution would be an obvious choice for the model distribution. That said, it might be useful to also fit the model with a Gaussian distribution and an identity link (no transformation) to assist us in interpreting the estimated coefficients. To emphasise, the Gaussian model will only be used as a learning aid; as a model, it does not have as much merit as the Poisson model.
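The fitting code does not appear above, so here is a hedged sketch of the two calls implied by the model summaries below. Because the original dataset is not reproduced here, the data frame `day_sim` is simulated with the same structure (4 treatment levels, 5 replicate counts each); the lambda values are illustrative only:

```r
## illustrative simulated stand-in for the real data
set.seed(1)
day_sim <- data.frame(
  TREAT    = factor(rep(c("ALG1", "ALG2", "NB", "S"), each = 5)),
  BARNACLE = rpois(20, lambda = rep(c(22, 28, 15, 13), each = 5))
)

## Gaussian model with identity link (learning aid only)
day_glm  <- glm(BARNACLE ~ TREAT, family = "gaussian", data = day_sim)

## Poisson model with log link (the appropriate choice for counts)
day_glm1 <- glm(BARNACLE ~ TREAT, family = poisson(link = "log"), data = day_sim)

## intercept is the (link-scale) mean of ALG1; remaining terms are
## treatment contrasts against ALG1
coef(day_glm1)
```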
6 Model validation
Conclusion:
- there is not much alarming here
- note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense with categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
Conclusion:
- there is not much alarming here
- note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense with categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
Influence measures of
glm(formula = BARNACLE ~ TREAT, family = "gaussian", data = day) :
dfb.1_ dfb.TREATA dfb.TREATN dfb.TREATS dffit cov.r cook.d hat inf
1 6.05e-01 -4.28e-01 -4.28e-01 -4.28e-01 0.6053 1.115 0.08900 0.2
2 -4.38e-01 3.10e-01 3.10e-01 3.10e-01 -0.4378 1.326 0.04862 0.2
3 -5.77e-01 4.08e-01 4.08e-01 4.08e-01 -0.5766 1.152 0.08143 0.2
4 7.54e-02 -5.33e-02 -5.33e-02 -5.33e-02 0.0754 1.608 0.00151 0.2
5 3.31e-01 -2.34e-01 -2.34e-01 -2.34e-01 0.3313 1.442 0.02843 0.2
6 7.16e-17 -4.08e-01 -5.34e-17 -4.47e-17 -0.5766 1.152 0.08143 0.2
7 -7.51e-17 4.28e-01 5.60e-17 4.70e-17 0.6053 1.115 0.08900 0.2
8 2.19e-17 -1.25e-01 -1.63e-17 -1.37e-17 -0.1766 1.565 0.00824 0.2
9 3.79e-17 -2.16e-01 -2.82e-17 -2.37e-17 -0.3051 1.467 0.02423 0.2
10 -5.77e-17 3.29e-01 4.30e-17 3.61e-17 0.4650 1.293 0.05451 0.2
11 1.47e-17 -4.15e-17 -5.78e-01 0.00e+00 -0.8180 0.839 0.15141 0.2
12 4.54e-18 -1.28e-17 -1.79e-01 0.00e+00 -0.2533 1.512 0.01682 0.2
13 -4.54e-18 1.28e-17 1.79e-01 0.00e+00 0.2533 1.512 0.01682 0.2
14 2.25e-18 -6.38e-18 -8.90e-02 0.00e+00 -0.1259 1.591 0.00421 0.2
15 -1.77e-17 5.00e-17 6.98e-01 0.00e+00 0.9866 0.643 0.20609 0.2
16 -1.12e-18 -4.95e-18 8.12e-18 -1.07e-01 -0.1512 1.579 0.00606 0.2
17 -5.15e-18 -2.27e-17 3.73e-17 -4.91e-01 -0.6937 0.998 0.11373 0.2
18 1.69e-18 7.46e-18 -1.22e-17 1.61e-01 0.2276 1.532 0.01363 0.2
19 7.06e-18 3.12e-17 -5.11e-17 6.73e-01 0.9515 0.681 0.19448 0.2
20 -2.07e-18 -9.14e-18 1.50e-17 -1.97e-01 -0.2791 1.490 0.02036 0.2
Influence measures of
glm(formula = BARNACLE ~ TREAT, family = poisson(link = "log"), data = day) :
dfb.1_ dfb.TREATA dfb.TREATN dfb.TREATS dffit cov.r cook.d hat inf
1 5.08e-01 -3.80e-01 -3.22e-01 -3.09e-01 0.5078 1.240 0.07380 0.2
2 -3.93e-01 2.94e-01 2.49e-01 2.39e-01 -0.3929 1.377 0.04032 0.2
3 -5.20e-01 3.89e-01 3.30e-01 3.17e-01 -0.5203 1.224 0.06752 0.2
4 6.59e-02 -4.93e-02 -4.17e-02 -4.01e-02 0.0659 1.611 0.00126 0.2
5 2.84e-01 -2.13e-01 -1.80e-01 -1.73e-01 0.2844 1.486 0.02358 0.2
6 -3.01e-17 -3.02e-01 2.01e-17 3.53e-17 -0.4548 1.305 0.05326 0.2
7 2.98e-17 2.99e-01 -1.99e-17 -3.50e-17 0.4508 1.310 0.05821 0.2
8 -9.16e-18 -9.20e-02 6.13e-18 1.08e-17 -0.1386 1.585 0.00539 0.2
9 -1.59e-17 -1.60e-01 1.06e-17 1.86e-17 -0.2403 1.522 0.01585 0.2
10 2.32e-17 2.33e-01 -1.55e-17 -2.72e-17 0.3511 1.422 0.03565 0.2
11 -2.42e-17 5.04e-17 -7.58e-01 3.04e-17 -0.9794 0.651 0.18750 0.2
12 -6.90e-18 1.43e-17 -2.16e-01 8.65e-18 -0.2787 1.491 0.02083 0.2
13 6.59e-18 -1.37e-17 2.06e-01 -8.26e-18 0.2663 1.501 0.02083 0.2
14 -3.38e-18 7.04e-18 -1.06e-01 4.24e-18 -0.1366 1.586 0.00521 0.2
15 2.45e-17 -5.10e-17 7.66e-01 -3.07e-17 0.9896 0.640 0.25521 0.2
16 -1.10e-18 -3.52e-18 -1.01e-17 -1.39e-01 -0.1758 1.566 0.00852 0.2
17 -5.54e-18 -1.78e-17 -5.08e-17 -7.03e-01 -0.8869 0.756 0.16004 0.2
18 1.59e-18 5.11e-18 1.46e-17 2.02e-01 0.2552 1.511 0.01918 0.2
19 6.41e-18 2.06e-17 5.88e-17 8.14e-01 1.0265 0.601 0.27367 0.2
20 -2.06e-18 -6.62e-18 -1.89e-17 -2.62e-01 -0.3301 1.443 0.02865 0.2
Not enough model terms in the conditional part of the model to check for
multicollinearity.
Conclusion:
- there is not much alarming here
- note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense with categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
Not enough model terms in the conditional part of the model to check for
multicollinearity.
Conclusion:
- there is not much alarming here
- note, there is no value in displaying leverage and leverage-based diagnostics since leverage does not make much sense with categorical predictors. An observation must be in one of the levels - it cannot really be an outlier in this dimension.
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.84683, p-value = 0.68
alternative hypothesis: two.sided
Conclusion:
- there is not much alarming here
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.81979, p-value = 0.68
alternative hypothesis: two.sided
Conclusion:
- there is not much alarming here
[1] 0
[1] 18.575
Conclusion:
- this is of little value as it is not really possible for a Gaussian model to be over-dispersed. Let's ignore this.
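The dispersion statistic behind such checks can also be computed by hand as the sum of squared Pearson residuals divided by the residual degrees of freedom; values substantially above 1 suggest over-dispersion. A sketch on a hypothetical Poisson fit (simulated data, illustrative only):

```r
## hypothetical Poisson fit on simulated counts
set.seed(1)
d <- data.frame(TREAT = factor(rep(LETTERS[1:4], each = 5)),
                Y     = rpois(20, 20))
fit <- glm(Y ~ TREAT, family = poisson, data = d)

## Pearson-residual dispersion statistic (~1 for well-behaved Poisson data)
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
```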
Conclusion:
- the various diagnostics suggest that the Poisson model is likely to be reliable.
- on the basis of the diagnostics, the Gaussian model would also be reliable. Nevertheless, we will only pursue the Gaussian model for the purpose of illustrating and interpreting the treatment effects and not because it is an appropriate model.
7 Partial plots
Predicted values are always on the response scale
Predicted values are always on the response scale
8 Model investigation / hypothesis testing
Call:
glm(formula = BARNACLE ~ TREAT, family = "gaussian", data = day)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.400 1.927 11.622 3.27e-09 ***
TREATALG2 6.000 2.726 2.201 0.04275 *
TREATNB -7.400 2.726 -2.715 0.01530 *
TREATS -9.200 2.726 -3.375 0.00386 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 18.575)
Null deviance: 1033.8 on 19 degrees of freedom
Residual deviance: 297.2 on 16 degrees of freedom
AIC: 120.73
Number of Fisher Scoring iterations: 2
Conclusions:
- the estimated y-intercept (mean number of newly recruited barnacles in the ALG1 group) is 22.4.
- there are expected to be 6 more newly recruited barnacles on the ALG2 substrate than on the ALG1 substrate.
- there are expected to be 7.4 and 9.2 fewer newly recruited barnacles on the NB and S substrates respectively than on the ALG1 substrate.
# A tibble: 4 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 22.4 1.93 11.6 0.00000000327 18.6 26.2
2 TREATALG2 6 2.73 2.20 0.0427 0.658 11.3
3 TREATNB -7.4 2.73 -2.71 0.0153 -12.7 -2.06
4 TREATS -9.2 2.73 -3.38 0.00386 -14.5 -3.86
# warning this is only appropriate for html output
day_glm |> sjPlot::tab_model(show.se=TRUE, show.aic=TRUE)
| BARNACLE | ||||
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 22.40 | 1.93 | 18.62 – 26.18 | <0.001 |
| TREAT [ALG2] | 6.00 | 2.73 | 0.66 – 11.34 | 0.028 |
| TREAT [NB] | -7.40 | 2.73 | -12.74 – -2.06 | 0.007 |
| TREAT [S] | -9.20 | 2.73 | -14.54 – -3.86 | 0.001 |
| Observations | 20 | |||
| R2 | 0.713 | |||
| AIC | 120.731 | |||
| Parameter | Coefficient | SE | 95% CI | t(16) | p |
|---|---|---|---|---|---|
| (Intercept) | 22.40 | 1.93 | (18.62, 26.18) | 11.62 | < .001 |
| TREAT (ALG2) | 6.00 | 2.73 | (0.66, 11.34) | 2.20 | 0.028 |
| TREAT (NB) | -7.40 | 2.73 | (-12.74, -2.06) | -2.71 | 0.007 |
| TREAT (S) | -9.20 | 2.73 | (-14.54, -3.86) | -3.38 | < .001 |
Call:
glm(formula = BARNACLE ~ TREAT, family = poisson(link = "log"),
data = day)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.10906 0.09449 32.903 < 2e-16 ***
TREATALG2 0.23733 0.12638 1.878 0.060387 .
TREATNB -0.40101 0.14920 -2.688 0.007195 **
TREATS -0.52884 0.15518 -3.408 0.000654 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 54.123 on 19 degrees of freedom
Residual deviance: 17.214 on 16 degrees of freedom
AIC: 120.34
Number of Fisher Scoring iterations: 4
Conclusions:
- the estimated y-intercept (mean number of newly recruited barnacles in the ALG1 group) is 3.11 (on the link scale).
- if, however, we exponentiate it (to back-transform it onto the response scale), then the expected number of newly recruited barnacles on the ALG1 substrate is 22.4.
- there is expected to be 0.24 (link scale) more newly recruited barnacles on the ALG2 substrate than the ALG1 substrate.
- when back-transformed (via exponentiation), this equates to a 1.27 fold increase in the number of newly recruited barnacles from ALG1 to ALG2 substrate. When expressed as a percentage, this equates to a 27% increase.
- similarly, the expected log counts are 0.40 and 0.53 lower on the NB and S substrates respectively than on the ALG1 substrate. These equate to 0.67 fold (-33%) and 0.59 fold (-41%) declines respectively.
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 2.917958584 3.2887098
TREATALG2 -0.009471895 0.4865512
TREATNB -0.696806649 -0.1108759
TREATS -0.837588039 -0.2280991
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 18.5034756 26.8082525
TREATALG2 0.9905728 1.6266964
TREATNB 0.4981736 0.8950498
TREATS 0.4327530 0.7960454
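The back-transformations above are just exponentiation of the link-scale coefficients and their confidence limits. A base-R sketch on a hypothetical Poisson fit (simulated data, illustrative only; `confint.default` gives Wald intervals rather than the profile intervals shown above):

```r
## hypothetical Poisson fit on simulated counts
set.seed(1)
d <- data.frame(TREAT = factor(rep(LETTERS[1:4], each = 5)),
                Y     = rpois(20, rep(c(22, 28, 15, 13), each = 5)))
fit <- glm(Y ~ TREAT, family = poisson(link = "log"), data = d)

exp(coef(fit))            # intercept: mean count of level A; others: rate ratios
exp(confint.default(fit)) # Wald intervals back-transformed to the response scale
```

For this parameterisation the exponentiated intercept recovers the sample mean of the first level exactly.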
# A tibble: 4 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.11 0.0945 32.9 1.98e-237 2.92 3.29
2 TREATALG2 0.237 0.126 1.88 6.04e- 2 -0.00947 0.487
3 TREATNB -0.401 0.149 -2.69 7.20e- 3 -0.697 -0.111
4 TREATS -0.529 0.155 -3.41 6.54e- 4 -0.838 -0.228
# A tibble: 4 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 22.4 0.0945 32.9 1.98e-237 18.5 26.8
2 TREATALG2 1.27 0.126 1.88 6.04e- 2 0.991 1.63
3 TREATNB 0.670 0.149 -2.69 7.20e- 3 0.498 0.895
4 TREATS 0.589 0.155 -3.41 6.54e- 4 0.433 0.796
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 22.4000000 | 0.0944911 | 32.903208 | 0.0000000 | 18.5034756 | 26.8082525 |
| TREATALG2 | 1.2678571 | 0.1263757 | 1.877957 | 0.0603870 | 0.9905728 | 1.6266964 |
| TREATNB | 0.6696429 | 0.1492042 | -2.687664 | 0.0071954 | 0.4981736 | 0.8950498 |
| TREATS | 0.5892857 | 0.1551776 | -3.407993 | 0.0006544 | 0.4327530 | 0.7960454 |
# warning this is only appropriate for html output
day_glm1 |> sjPlot::tab_model(show.se=TRUE,show.aic=TRUE)
| BARNACLE | ||||
| Predictors | Incidence Rate Ratios | std. Error | CI | p |
| (Intercept) | 22.40 | 2.12 | 18.50 – 26.81 | <0.001 |
| TREAT [ALG2] | 1.27 | 0.16 | 0.99 – 1.63 | 0.060 |
| TREAT [NB] | 0.67 | 0.10 | 0.50 – 0.90 | 0.007 |
| TREAT [S] | 0.59 | 0.09 | 0.43 – 0.80 | 0.001 |
| Observations | 20 | |||
| R2 Nagelkerke | 0.902 | |||
| AIC | 120.340 | |||
| Parameter | Log-Mean | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 3.11 | 0.09 | (2.92, 3.29) | 32.90 | < .001 |
| TREAT (ALG2) | 0.24 | 0.13 | (-9.47e-03, 0.49) | 1.88 | 0.060 |
| TREAT (NB) | -0.40 | 0.15 | (-0.70, -0.11) | -2.69 | 0.007 |
| TREAT (S) | -0.53 | 0.16 | (-0.84, -0.23) | -3.41 | < .001 |
| Parameter | IRR | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 22.40 | 2.12 | (18.50, 26.81) | 32.90 | < .001 |
| TREAT (ALG2) | 1.27 | 0.16 | (0.99, 1.63) | 1.88 | 0.060 |
| TREAT (NB) | 0.67 | 0.10 | (0.50, 0.90) | -2.69 | 0.007 |
| TREAT (S) | 0.59 | 0.09 | (0.43, 0.80) | -3.41 | < .001 |
9 Predictions
contrast ratio SE df null z.ratio p.value
ALG1 / ALG2 0.789 0.0997 Inf 1 -1.878 0.2375
ALG1 / NB 1.493 0.2228 Inf 1 2.688 0.0362
ALG1 / S 1.697 0.2633 Inf 1 3.408 0.0037
ALG2 / NB 1.893 0.2703 Inf 1 4.472 <.0001
ALG2 / S 2.152 0.3205 Inf 1 5.143 <.0001
NB / S 1.136 0.1918 Inf 1 0.757 0.8735
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log scale
## Including confidence intervals
day_glm1 |> emmeans(~TREAT, type='response') |> pairs() |> confint()
 contrast    ratio     SE  df asymp.LCL asymp.UCL
ALG1 / ALG2 0.789 0.0997 Inf 0.570 1.09
ALG1 / NB 1.493 0.2228 Inf 1.018 2.19
ALG1 / S 1.697 0.2633 Inf 1.139 2.53
ALG2 / NB 1.893 0.2703 Inf 1.312 2.73
ALG2 / S 2.152 0.3205 Inf 1.467 3.15
NB / S 1.136 0.1918 Inf 0.737 1.75
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
Intervals are back-transformed from the log scale
## Confidence intervals and p-values
day_glm1 |> emmeans(~TREAT, type='response') |> pairs() |> summary(infer=TRUE)
 contrast    ratio     SE  df asymp.LCL asymp.UCL null z.ratio p.value
ALG1 / ALG2 0.789 0.0997 Inf 0.570 1.09 1 -1.878 0.2375
ALG1 / NB 1.493 0.2228 Inf 1.018 2.19 1 2.688 0.0362
ALG1 / S 1.697 0.2633 Inf 1.139 2.53 1 3.408 0.0037
ALG2 / NB 1.893 0.2703 Inf 1.312 2.73 1 4.472 <.0001
ALG2 / S 2.152 0.3205 Inf 1.467 3.15 1 5.143 <.0001
NB / S 1.136 0.1918 Inf 0.737 1.75 1 0.757 0.8735
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
Intervals are back-transformed from the log scale
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log scale
## Let's store this for later
day.pairwise <- day_glm1 |>
emmeans(~TREAT, type='response') |>
pairs() |>
summary(infer=TRUE) |>
as.data.frame()
## Comparisons on an absolute scale
day_glm1 |> emmeans(~TREAT) |> regrid() |> pairs() |> summary(infer=TRUE)
 contrast    estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
ALG1 - ALG2 -6.0 3.19 Inf -14.189 2.19 -1.882 0.2356
ALG1 - NB 7.4 2.73 Inf 0.374 14.43 2.706 0.0344
ALG1 - S 9.2 2.67 Inf 2.345 16.06 3.448 0.0032
ALG2 - NB 13.4 2.95 Inf 5.831 20.97 4.548 <.0001
ALG2 - S 15.2 2.88 Inf 7.790 22.61 5.270 <.0001
NB - S 1.8 2.37 Inf -4.301 7.90 0.758 0.8733
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
P value adjustment: tukey method for comparing a family of 4 estimates
Conclusions:
- the contrasts section of the output indicates that there is evidence of:
- ALG1 has 1.49 fold (49%) more newly recruited barnacles than the NB substrate
- ALG1 has 1.70 fold (70%) more newly recruited barnacles than the S substrate
- ALG2 has 1.89 fold (89%) more newly recruited barnacles than the NB substrate
- ALG2 has 2.15 fold (115%) more newly recruited barnacles than the S substrate
- ALG1 was not found to be different to ALG2 and NB was not found to be different to S
## Comparisons on a link scale
day_glm1 |> estimate_contrasts(transform = "link", p_adjust = "tukey")
No variable was specified for contrast estimation. Selecting `contrast = "TREAT"`.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | df | z | p
--------------------------------------------------------------------------
ALG1 | ALG2 | -0.24 | [-0.56, 0.09] | 0.13 | Inf | -1.88 | 0.238
ALG1 | NB | 0.40 | [ 0.02, 0.78] | 0.15 | Inf | 2.69 | 0.036
ALG1 | S | 0.53 | [ 0.13, 0.93] | 0.16 | Inf | 3.41 | 0.004
ALG2 | NB | 0.64 | [ 0.27, 1.01] | 0.14 | Inf | 4.47 | < .001
ALG2 | S | 0.77 | [ 0.38, 1.15] | 0.15 | Inf | 5.14 | < .001
NB | S | 0.13 | [-0.31, 0.56] | 0.17 | Inf | 0.76 | 0.874
Marginal contrasts estimated at TREAT
p-value adjustment method: Tukey
## Comparisons on a fractional scale
day_glm1 |> estimate_contrasts(transform = "none", p_adjust = "tukey")
No variable was specified for contrast estimation. Selecting `contrast = "TREAT"`.
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | df | z | p
--------------------------------------------------------------------------
ALG1 | ALG2 | -0.24 | [-0.56, 0.09] | 0.13 | Inf | -1.88 | 0.238
ALG1 | NB | 0.40 | [ 0.02, 0.78] | 0.15 | Inf | 2.69 | 0.036
ALG1 | S | 0.53 | [ 0.13, 0.93] | 0.16 | Inf | 3.41 | 0.004
ALG2 | NB | 0.64 | [ 0.27, 1.01] | 0.14 | Inf | 4.47 | < .001
ALG2 | S | 0.77 | [ 0.38, 1.15] | 0.15 | Inf | 5.14 | < .001
NB | S | 0.13 | [-0.31, 0.56] | 0.17 | Inf | 0.76 | 0.874
Marginal contrasts estimated at TREAT
p-value adjustment method: Tukey
Define your own
Compare:
- ALG1 vs ALG2
- NB vs S
- average of ALG1+ALG2 vs NB+S
| Levels | Alg1 vs Alg2 | NB vs S | Alg vs Bare |
|---|---|---|---|
| Alg1 | 1 | 0 | 0.5 |
| Alg2 | -1 | 0 | 0.5 |
| NB | 0 | 1 | -0.5 |
| S | 0 | -1 | -0.5 |
## Alg1_Alg2 NB_S Alg_Bare
## ALG1 1 0 0.5
## ALG2 -1 0 0.5
## NB 0 1 -0.5
## S 0 -1 -0.5
cmat <- cbind('Alg1_Alg2'=c(1,-1,0,0),
              'NB_S'=c(0,0,1,-1),
              'Alg_Bare'=c(0.5,0.5,-0.5,-0.5))
cmat
     Alg1_Alg2 NB_S Alg_Bare
[1,] 1 0 0.5
[2,] -1 0 0.5
[3,] 0 1 -0.5
[4,] 0 -1 -0.5
It is important that each of the comparisons is independent of the others. We can test this by exploring the cross products of the contrast matrix. Independent comparisons have a cross product of 0, so if all of the values in the lower (and, equivalently, the upper) triangle of the cross-product matrix are 0, then all comparisons are independent. Non-zero values indicate a lack of independence and the need to remove one of the comparisons from the set of contrasts. Note, we can ignore the diagonal values.
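The cross products can be obtained with `crossprod()`. A self-contained sketch, re-creating the contrast matrix defined above:

```r
cmat <- cbind(Alg1_Alg2 = c(1, -1, 0, 0),
              NB_S      = c(0, 0, 1, -1),
              Alg_Bare  = c(0.5, 0.5, -0.5, -0.5))

## off-diagonal zeros indicate the comparisons are mutually
## independent (orthogonal); the diagonal can be ignored
crossprod(cmat)
```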
Conclusions:
- all comparisons are independent
$emmeans
TREAT rate SE df asymp.LCL asymp.UCL
ALG1 22.4 2.12 Inf 18.6 27.0
ALG2 28.4 2.38 Inf 24.1 33.5
NB 15.0 1.73 Inf 12.0 18.8
S 13.2 1.62 Inf 10.4 16.8
Confidence level used: 0.95
Intervals are back-transformed from the log scale
$contrasts
contrast ratio SE df null z.ratio p.value
TREAT.Alg1_Alg2 0.789 0.0997 Inf 1 -1.878 0.0604
TREAT.NB_S 1.136 0.1918 Inf 1 0.757 0.4488
TREAT.Alg_Bare 1.792 0.1890 Inf 1 5.536 <.0001
Tests are performed on the log scale
day_glm1 |> emmeans(~TREAT, type='response') |>
contrast(method=list(TREAT=cmat)) |>
summary(infer=TRUE)
 contrast         ratio     SE  df asymp.LCL asymp.UCL null z.ratio p.value
TREAT.Alg1_Alg2 0.789 0.0997 Inf 0.616 1.01 1 -1.878 0.0604
TREAT.NB_S 1.136 0.1918 Inf 0.816 1.58 1 0.757 0.4488
TREAT.Alg_Bare 1.792 0.1890 Inf 1.458 2.20 1 5.536 <.0001
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Tests are performed on the log scale
## what about absolute differences
day_glm1 |> emmeans(~TREAT, type='link') |>
regrid() |>
contrast(method=list(TREAT=cmat)) |>
summary(infer=TRUE)
 contrast        estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
TREAT.Alg1_Alg2 -6.0 3.19 Inf -12.25 0.247 -1.882 0.0598
TREAT.NB_S 1.8 2.37 Inf -2.85 6.455 0.758 0.4485
TREAT.Alg_Bare 11.3 1.99 Inf 7.40 15.195 5.686 <.0001
Confidence level used: 0.95
Conclusions:
- the number of newly recruited barnacles was not found to be significantly different between ALG1 and ALG2 substrates
- the number of newly recruited barnacles was not found to be significantly different between NB and S substrates
- the number of newly recruited barnacles was found to be significantly higher on the algal substrates than on the bare substrates
- the algal substrates attracted 1.79 fold (79%) more newly recruited barnacles than the bare substrates.
If you want to express the comparisons on an absolute scale…
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
TREAT.Alg1_Alg2 -6.0 3.19 Inf -12.25 0.247 -1.882 0.0598
TREAT.NB_S 1.8 2.37 Inf -2.85 6.455 0.758 0.4485
TREAT.Alg_Bare 11.3 1.99 Inf 7.40 15.195 5.686 <.0001
Confidence level used: 0.95
10 Summary figures
TREAT rate SE df asymp.LCL asymp.UCL
ALG1 22.4 2.116601 Inf 18.61303 26.95746
ALG2 28.4 2.383275 Inf 24.09279 33.47724
NB 15.0 1.732050 Inf 11.96198 18.80960
S 13.2 1.624807 Inf 10.37047 16.80156
Confidence level used: 0.95
Intervals are back-transformed from the log scale
## A quick version
ggplot(newdata, aes(y=rate, x=TREAT)) +
geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
theme_classic()
A more thorough version that also includes Tukey's test
g1 <- ggplot(newdata, aes(y=rate, x=TREAT)) +
geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL))+
geom_point()+
scale_x_discrete('Treatment', breaks=c('ALG1','ALG2','NB','S'),
labels=c('Algae spp 1', 'Algae spp 2', 'Naturally bare', 'Scraped bare'))+
scale_y_continuous(expression(Number~of~newly~recruited~barnacles~(cm^2)))+
theme_classic()
g2 <- day.pairwise |>
ggplot(aes(y=ratio, x=contrast)) +
geom_hline(yintercept=1, linetype='dashed') +
geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
scale_y_continuous(trans=scales::log2_trans(), breaks=scales::breaks_log(base=2)) +
coord_flip(ylim=c(0.25, 4)) +
theme_classic()
g1 + g2
newdata = with(day, data.frame(TREAT=levels(TREAT)))
Xmat <- model.matrix(~TREAT, newdata)
coefs <- coef(day_glm1)
fit <- as.vector(coefs %*% t(Xmat))
se <- sqrt(diag(Xmat %*% vcov(day_glm1) %*% t(Xmat)))
q <- qt(0.975, df=df.residual(day_glm1))
newdata = newdata |>
mutate(Fit = exp(fit),
Lower=exp(fit - q*se),
Upper=exp(fit+q*se))
## A quick version
ggplot(newdata, aes(y=Fit, x=TREAT)) +
geom_pointrange(aes(ymin=Lower, ymax=Upper)) +
theme_classic()
A more thorough version that also includes Tukey's test
g1 <- ggplot(newdata, aes(y=Fit, x=TREAT)) +
geom_pointrange(aes(ymin=Lower, ymax=Upper))+
geom_point()+
scale_x_discrete('Treatment', breaks=c('ALG1','ALG2','NB','S'),
labels=c('Algae spp 1', 'Algae spp 2', 'Naturally bare', 'Scraped bare'))+
scale_y_continuous(expression(Number~of~newly~recruited~barnacles~(cm^2)))+
theme_classic()
g2 <- day.pairwise |>
ggplot(aes(y=ratio, x=contrast)) +
geom_hline(yintercept=1, linetype='dashed') +
geom_pointrange(aes(ymin=asymp.LCL, ymax=asymp.UCL)) +
scale_y_continuous(trans=scales::log2_trans(), breaks=scales::breaks_log(base=2)) +
coord_flip(ylim=c(0.25, 4)) +
theme_classic()
g1 + g2